The Effects of Datapath Placement and C-Slow Retiming on Three Computational Benchmarks

Authors

  • Nicholas Weaver
  • John Wawrzynek
Abstract

Two important optimizations within the FPGA design process, C-slow retiming and datapath placement, offer significant benefits for designers. Many have advocated and implemented tools that apply these techniques in both automatic and semiautomatic ways [1][2][5], but they have not made their way into conventional FPGA toolflows.

C-slow retiming [3] is a method of accelerating computations that include feedback loops. Instead of having a single instance of the computation, the feedback loop is pipelined so that C separate instances are all calculated simultaneously. This allows fine-grained pipelining to occur even in designs that include feedback loops, such as single-round cryptographic implementations or microprocessors. Done properly, it imposes a significant but not prohibitive latency penalty on single computations while offering huge increases in throughput.

Datapath placement is simply constructing the design in a manner that accounts for the higher-level data flows. This offers several benefits, including improved performance [4], more physically compact designs, shorter wires, and faster place and route times when the FPGA is heavily utilized. Even for designs with less structure that are amenable to simulated annealing, datapath placement may still offer a significant benefit.

To clearly demonstrate the importance of these optimizations, we have hand-modified three computational benchmarks which represent significant themes within FPGA computation: Rijndael/AES encryption, Smith/Waterman, and a simplified 32-bit microprocessor datapath. All three represent significantly different modes of computation within FPGAs, but all gain significantly from the use of these techniques. The first, Rijndael encryption, uses bit and byte mixing and small table lookups on 128-bit datapaths to implement encryption. Because this has been adopted as the DES replacement by the US Government, it will undoubtedly be the most widely deployed block cipher over the next twenty years. The second, Smith/Waterman, is a commonly used systolic sequence matching routine for bioinformatics. It relies on 16-bit additions and clearly demonstrates the benefits of specialization. The final benchmark is the datapath for a simplified, multithreaded, 32-bit processor. It uses a radically simplified MIPS instruction set and is typical of full processor implementations with simplified control.

All implementations were targeted toward the Xilinx Spartan II series, a Virtex [6] derived FPGA intended for low-cost applications. The Spartan II was selected for two reasons: its modern architecture has the features needed to make C-slow retiming effective (embedded memories, LUTs as shift registers), and it demonstrates the capabilities of current low-cost FPGAs. We are able to achieve >100 MHz operation with throughput speedups of 2x or greater on all three benchmark circuits by applying these optimizations together. All designs were input as schematics using Xilinx Foundation 4.1, with RLOCs used to place almost every block (except for a few control signals which are better handled by simulated annealing). All compilation was performed on a Pentium II 300 with 256 MB of memory running Windows 2000. Timing results are from static timing analysis at the worst-case process corner for the -5 speed grade part. The place and route effort was set to maximum for all runs, but no other constraints were imposed on the toolflow.
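The idea behind C-slow retiming described in the abstract can be illustrated with a small behavioural model. The following is only a sketch under stated assumptions, not the authors' implementation: the loop body `step` is a hypothetical toy function, and the names `run_single` and `run_c_slow` are invented for illustration. The model replaces the single feedback register with a C-deep shift register and serves C independent streams in round-robin order. It checks functional equivalence only; it does not model the clock-rate gain obtained when the extra registers are retimed into the logic.

```python
from collections import deque


def step(state, x):
    """Hypothetical loop body: a toy 16-bit mix-and-accumulate function."""
    return (state * 31 + x) & 0xFFFF


def run_single(inputs, init=0):
    """Reference design: one feedback register serving one stream."""
    state = init
    for x in inputs:
        state = step(state, x)
    return state


def run_c_slow(streams, init=0):
    """C-slowed design: the feedback register becomes a C-deep shift register.

    Each cycle serves one stream in round-robin order, so a stream meets
    its own state again exactly C cycles after writing it.
    """
    c = len(streams)
    n = len(streams[0])
    assert all(len(s) == n for s in streams), "streams must have equal length"
    regs = deque([init] * c)  # C registers in the feedback path
    for t in range(n * c):
        stream_id = t % c             # stream served on this cycle
        x = streams[stream_id][t // c]
        state = regs.popleft()        # value written C cycles earlier
        regs.append(step(state, x))
    # After a whole number of rounds the registers hold streams 0..C-1 in order.
    return list(regs)


if __name__ == "__main__":
    streams = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    interleaved = run_c_slow(streams)
    separate = [run_single(s) for s in streams]
    print(interleaved == separate)
```

Each stream needs C cycles per loop iteration in the interleaved model, which corresponds to the single-computation latency penalty mentioned in the abstract; the throughput benefit comes from the shorter clock period that the added registers make possible.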


Similar resources

Half-buffer retiming and token cages for synchronous elastic circuits

Synchronous elastic circuits borrow the tolerance of computation and communication latencies from the asynchronous design style. The datapath is made elastic by turning registers into elastic buffers and adding a control layer that uses synchronous handshake signals and join/fork controllers. Join elements are the objective of two improvements discussed in this paper. Half-buffer retiming allow...


Minimum-Perturbation Retiming for Delay Optimization

This paper describes a fast retiming algorithm targeting delay while minimizing the number of flip-flops moved. The algorithm can be applied before placement to minimize logic level, or after placement to minimize the critical region annotated with wire delays. Experiments on a suite of industrial benchmarks show that the algorithm improves fMAX by 9% while increasing LUT count by 1% and flip-f...


Equivalence Relations of Synchronous Schemes

Synchronous systems are single purpose multiprocessor computing machines, which provide a realistic model of computation capturing the concepts of pipelining, parallelism and interconnection. The syntax of a synchronous system is specified by the synchronous scheme, which is in fact a directed, labelled, edge-weighted multigraph. The vertices and their labels represent functional elements (the...


Tight coupling of timing-driven placement and retiming

Retiming is a widely investigated technique for performance optimization. In general, it performs extensive modifications on a circuit netlist, leaving it unclear whether the achieved performance improvement will still be valid after placement has been performed. This paper presents an approach for integrating retiming into a timing-driven placement environment. The experimental results show t...


The effects of placement position and corm size of saffron (Crocus sativus L.) on stigma and corm yields in Ankara conditions

Background & Aim: Saffron (Crocus sativus L.) formerly was important in Turkey. Saffron cultivation has been decreased and it is now only cultivated in three villages in this country. It is triploid and exclusively propagated in a vegetative way by corms. In Turkey, saffron is traditionally planted by placing corms in rows randomly without grading or sorting; which results in placement...



Publication date: 2002